Abstract



The Kruskal Count is a card trick invented by Martin Kruskal in which a magician "guesses" 
a card selected by a subject according to a certain counting procedure. With high probability 
the magician can correctly "guess" the card. The success of the trick is based on a mathe- 
matical principle related to coupling models for Markov chains. This paper analyzes in detail 
two simplified variants of the trick and estimates the probability of success. The results are 
compared with simulation data for several variants of the actual trick.

1. Introduction

The Kruskal Count is a card trick invented by Martin D. Kruskal (who is best known
for his work on solitons), which is described in Fulves and Gardner and in Gardner.
In this card trick a magician "guesses" one card in a deck of cards which is determined by
a subject using a special counting procedure that we call Kruskal's counting procedure. The
magician can with high probability identify the correct card.

The subject shuffles a deck of cards as many times as he likes. He mentally chooses a 
(secret) number between one and ten. Kruskal's counting procedure then goes as follows. The 
subject turns the cards of the deck face up one at a time, slowly, and places them in a pile. 
As he turns up each card he decreases his secret number by one and he continues to count this
way until he reaches zero. The card just turned up at the point when the count reaches zero is
called the first key card and its value is called the first key number. Here the value of an Ace 
is one, face cards are assigned the value five, and all other cards take their numerical value. 
The subject now starts the count over, using the first key number to determine where to stop 
the count at the second key card. He continues in this fashion, obtaining successive key cards 
until the deck is exhausted. The last key card encountered, which we call the tapped card, is 
the card to be "guessed" by the magician. 
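The counting procedure just described can be sketched in a few lines of Python. This is only an illustration: the function names, the encoding of ranks as integers 1 to 13, and the ten-card example deck are assumptions made here, not part of the original description.

```python
def card_value(rank, face_value=5):
    """Value of a card of rank 1..13 (1 = Ace; 11, 12, 13 = J, Q, K)."""
    return rank if rank <= 10 else face_value


def key_card_positions(values, secret):
    """Return the 1-based positions of the successive key cards.

    values[k - 1] is the value of the card in position k, and `secret`
    is the secret number, i.e. the position of the first key card.
    """
    positions = []
    position = secret
    while position <= len(values):
        positions.append(position)
        position += values[position - 1]   # the key number restarts the count
    return positions


def tapped_card_position(values, secret):
    """Position of the last key card reached before the deck runs out.

    Assumes secret <= len(values), so that at least one key card exists.
    """
    return key_card_positions(values, secret)[-1]


# A hypothetical ten-card deck (ranks chosen only for illustration).
ranks = [4, 2, 12, 1, 3, 2, 13, 1, 4, 3]
values = [card_value(r) for r in ranks]
for secret in (1, 2, 3):
    print(secret, key_card_positions(values, secret))
```

In this small example the three key card sequences start at different cards but all end at the same tapped card, which is the effect the trick relies on.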

The Kruskal counting procedure for selecting the tapped card depends on the subject's 
secret number and the ordering of cards in the deck. The ordering is known to the magician 
because the cards are turned face up, but the subject's secret number is unknown. It appears 
impossible for the magician to know the subject's secret number. The mathematical basis of 
the trick is that for most orderings of the deck most secret numbers produce the same tapped 
card. For any given deck two different secret numbers produce two different sequences of key 
cards, but if the two sequences ever have a key card in common, then they coincide from that 
point on, and arrive at the same tapped card. The magician therefore selects his own secret 
number and carries out the Kruskal counting procedure for it while the subject does his own 
count. The magician's "guess" is his own tapped card. The Kruskal Count trick succeeds with 
high probability, but if it fails the magician must fall back on his own wits to entertain the 
audience. 

The problem of determining the probability of success of this trick leads to some interesting 
mathematical questions. We are concerned with the ensemble success probability averaged over 
all possible orderings of the deck (with the uniform distribution). Our objective in this paper 
is to estimate ensemble success probabilities for mathematical idealizations of such counting 
procedures. Then we numerically compare the ensemble success probabilities on a 52-card deck 
with that of the Kruskal Count trick itself. The success probability of the trick depends in part 
on the magician's strategy for choosing his own secret number. We show that the magician 
does best to always choose the first card in the deck as his first key card, i.e. to use secret 
number 1. 

The general mathematical problem we consider applies the Kruskal counting procedure to
a deck of $N$ labelled cards with each card label a positive integer, in which each card has its
label drawn independently from some fixed probability distribution on the positive integers
$\mathbb{N}^+$. We call such distributions i.i.d. deck distributions; they are specified by the probabilities
$\{\pi_j : j \ge 1\}$ of a fixed card having value $j$. We assume that the subject chooses an initial secret
number from an initial probability distribution on $\mathbb{N}^+ = \{1, 2, 3, \ldots\}$, and that the magician
independently does the same from a possibly different initial probability distribution, and that
thereafter each follows the Kruskal counting procedure. It is convenient to view the cards of
the deck as turned over at unit times, so that the card in the $M$-th position is turned over at
time $M$. If the $M$-th card is a key card for both magician and subject and no previous card
is a key card for both, then we say that $M$ is the coupling time for the sequences. Let $t$ be a
random variable denoting the coupling time on the resulting probability space, with $t = +\infty$ if
coupling does not occur. We wish to estimate the "failure probability" $\mathrm{Prob}[t > N]$.

The set of permutations of a fixed deck (with uniform distribution) does not have the 
i.i.d. property, and is not Markovian, but it can be reasonably well approximated by such 
a distribution. The advantage of the simplifying assumption of an i.i.d. deck distribution is 
that the random variable t can be interpreted as a stopping time for a coupling method for a 
Markov chain, as is explained in §2. 

The mathematical contents of the paper are the determination of $\mathrm{Prob}[t > n]$ for a geometric
i.i.d. deck distribution, which is carried out in §3, and the estimation of $\mathrm{Prob}[t > n]$ for a uniform
i.i.d. deck distribution, which is carried out in §4. The proofs of several results stated in §4
are given in an appendix.

In §5 we consider the actual Kruskal count trick, and compare its success probability 
with the approximations given by the models above. Because the Kruskal count trick using 
an actual deck of 52 cards involves a stochastic process that is not Markovian, we estimate 
the success probability by Monte Carlo simulation. We consider the effect on this success 
probability of varying the magician's strategy for choosing his key card, and of varying the 
value assigned to face cards. The magician should choose his key card value to be 1. Assuming 
this strategy for the magician, the success probability of the original Kruskal Count trick is 
just over 85%. Both the i.i.d. geometric distribution and i.i.d. uniform distribution models 
above give good approximations; the geometric distribution is off by less than 3%, and the 
uniform approximation is within 1%. 

There has been some previous work on mathematical models of the Kruskal count. In
1975 Mallows determined the expected value of the coupling time of i.i.d. sequences,
and observed that especially simple formulae occur for the geometric distribution. Recently
Haga and Robins analyzed a simplified Markov chain model for the Kruskal count, which
is related to, but not the same as, the models considered here. We discuss their model further
at the end of §4.

2. Coupling Methods for Markov Chains 

The coupling time random variable t is a special case of a stopping time random vari- 
able t* associated with a coupling method for studying a Markov chain. This motivates our 
terminology. 

To explain this connection, consider a homogeneous Markov chain $\{X_n : n \ge 0\}$ on a
countable discrete state space $S$. Given two initial probability distributions $p$ and $p'$ on $S$, a
coupling method constructs a bivariate process $(X_n^1, X_n^2)$ consisting of two copies of the process
$X_n$ with $X_0^1$ having distribution $p$, $X_0^2$ having distribution $p'$, and the two copies evolve
independently until some (random) stopping time $t^*$ at which $X_{t^*}^1 = X_{t^*}^2$, and then requires
them to be equal thereafter, evolving as a single process $X_n$. The stopping time $t^*$ is not
necessarily required to be the first time $t$ at which $X_t^1 = X_t^2$ occurs, and the particular rule for
choosing $t^*$ defines the coupling method. Let $\mu_n$, $\mu'_n$ denote the distribution at time $n$ of the
process $X_n$ starting from the distribution $p$, $p'$ respectively, at time 0, and let the variation
distance $\|p - p'\|$ between two distributions on $S$ be

    $\|p - p'\| := \frac{1}{2} \sum_{s \in S} |p(s) - p'(s)|$.

The basic coupling inequality is

    $\|\mu_n - \mu'_n\| \le \mathrm{Prob}[t^* > n]$.    (2.1)

Such inequalities can be used to prove ergodicity of a Markov chain and to bound the speed 
of convergence to the equilibrium distribution, by bounding the right side of the inequality. 

The first coupling method was invented by Doeblin, and many other coupling methods
have been proposed since; see Griffeath for a survey. Applications to card shuffling and
random walks on groups are described in Aldous and Diaconis and in Diaconis. The basic
coupling inequality (2.1) is also valid for non-ergodic Markov chains, e.g. null-recurrent or
transient Markov chains on the state space $\mathbb{N}$, as was observed by Pitman. Coupling methods
are traditionally used as an auxiliary device to get information on the rate of convergence
to equilibrium of an ergodic Markov chain. In this paper, we are interested in obtaining upper
and lower bounds for the coupling probability itself, since it represents the failure probability
of the Kruskal Count trick. We do not use the basic coupling inequality, but instead in §4 use
inequalities relating coupling probabilities for various different Markov chains.

For an i.i.d. deck the Kruskal counting procedure can be viewed as moving on a Markov
chain $\mathcal{M}_\pi$ on the state space $\mathbb{N}$ where a state $j$ represents a current value of the Kruskal
counting procedure, with state $0$ representing being at a key card, and state $j$ representing that
the next key card will be reached after exactly $j$ more cards are turned over. Each transition of the
Markov chain will correspond to turning over one card in the deck. Let the random variable
$X_n$ denote the state of the Markov chain at time $n$; it indicates the current Kruskal count value
at location $n$ of the deck, except that $X_n = 0$ indicates a key card at location $n$. The transition
probability for this chain from state $j \ge 1$ is probability $1$ to state $j - 1$ and $0$ to all other
states, and from state $0$ to state $j$ is probability $\pi_{j+1}$, where $\{\pi_j : j \ge 1\}$ is the distribution
$\pi$ of card labels. (That is, $\pi_1$ is the probability that the key card has value 1, and the chain
transitions from state $0$ to state $0$.) The initial distributions of secret numbers are distributions
$p$, $p'$ on the state space $\mathbb{N}$. We define the random variable $t = t(p, p')$ to be the stopping
time associated to the coupling method that combines the chains $X_n^1$ and $X_n^2$ at the first time
that $X_n^1 = X_n^2 = 0$. (This is not necessarily the first time that $X_n^1 = X_n^2$.) The basic coupling
inequality (2.1) for $\mathcal{M}_\pi$ and $t$ then gives

    $\|\mu_n - \mu'_n\| \le \mathrm{Prob}[t > n]$,    (2.2)

where $\mu_n$ and $\mu'_n$ are the $n$-step state probabilities for the chain $\mathcal{M}_\pi$ started with initial
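distributions $p$ and $p'$. We note that the Markov chain $\mathcal{M}_\pi$ is ergodic if
$E[\pi] = \sum_{j=1}^{\infty} j \pi_j$ is finite, and is null-recurrent otherwise. In the ergodic case the
stationary distribution $\tilde{\pi} = (\tilde{\pi}_0, \tilde{\pi}_1, \tilde{\pi}_2, \ldots)$ is given by

    $\tilde{\pi}_j = (1 - \pi_1 - \pi_2 - \cdots - \pi_j)\, E[\pi]^{-1}$

for $j \ge 0$; the normalization follows from $\sum_{j \ge 0} (1 - \pi_1 - \cdots - \pi_j) = E[\pi]$. This chain is
ergodic for the deck distributions that we consider, and our object is
to estimate the "failure probability" $\mathrm{Prob}[t > n]$.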
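As a numerical sanity check on the stationary distribution of the one-card-at-a-time chain, the following sketch builds the transition matrix for a label distribution with finite support and verifies stationarity with exact rational arithmetic. The function names and the choice of a uniform label distribution on {1, ..., 5} are assumptions made for illustration; the candidate stationary vector is taken proportional to the tail probabilities $1 - \pi_1 - \cdots - \pi_j$ and normalized so that its entries sum to one.

```python
from fractions import Fraction


def count_chain(label_probs):
    """Transition matrix on states 0..B-1, where label_probs[v - 1] = Prob[label = v]."""
    size = len(label_probs)
    matrix = [[Fraction(0)] * size for _ in range(size)]
    for j in range(size):
        matrix[0][j] = label_probs[j]       # key card with value j + 1: state 0 -> state j
    for j in range(1, size):
        matrix[j][j - 1] = Fraction(1)      # otherwise the count simply decreases by one
    return matrix


def tail_distribution(label_probs):
    """Vector proportional to Prob[label > j], j = 0..B-1, normalized to sum to one."""
    tails, remaining = [], Fraction(1)
    for prob in label_probs:
        tails.append(remaining)
        remaining -= prob
    total = sum(tails)
    return [tail / total for tail in tails]


def is_stationary(matrix, dist):
    size = len(dist)
    return all(sum(dist[i] * matrix[i][j] for i in range(size)) == dist[j]
               for j in range(size))


labels = [Fraction(1, 5)] * 5               # hypothetical example: uniform on {1, ..., 5}
print(is_stationary(count_chain(labels), tail_distribution(labels)))
```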

In the remainder of the paper, rather than considering Markov chains of the type M n , we 
study simplified Markov chains that jump from one key card to the next, but which retain 
enough information for coupling methods to apply.

3. Geometric Distribution

We consider an idealized deck consisting of cards whose labels are independently and identically
distributed random variables drawn from $\mathbb{N}^+ = \{1, 2, 3, \ldots\}$ with the geometric
distribution $\mathcal{G}_p$ given by $\pi_k = (1 - p) p^{k-1}$, $0 < p < 1$. The geometric distribution has mean

    $E[\mathcal{G}_p] = \sum_{k=1}^{\infty} k \pi_k = \frac{1}{1 - p}$.    (3.1)

Let $\mathcal{G}_N(p)$ denote the deck distribution induced on a deck of $N$ cards.

Assume that the magician and subject both pick a secret number drawn from the same
geometric distribution $\mathcal{G}_p$. Let $\mathrm{Prob}[t > N]$ denote the probability (choosing a deck of cards
at random as above) that the magician and subject have no common key card in positions 1
through $N$.

For the geometric deck distribution there is a simple exact formula for all coupling proba- 
bilities. 



Theorem 3.1. For the geometric deck distribution $\mathcal{G}_N(p)$ with initial geometric value
distributions $\mathcal{G}_p$,

    $\mathrm{Prob}[t > N] = p^N (2 - p)^N$.    (3.2)

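Proof. We use the memorylessness property of the geometric distribution, which is that for a
$\mathcal{G}_p$-distributed variable $u$ the conditional probability $\mathrm{Prob}[u = k \mid u > \ell]$ satisfies

    $\mathrm{Prob}[u = k \mid u > \ell] = \mathrm{Prob}[u = k - \ell]$.    (3.3)

In what follows $X_1^i = 1$ means that the first card is a key card for player $i$, and $X_1^i \ge 2$
means that it is not. By direct computation

    $\mathrm{Prob}[t > 1] = 1 - (1 - p)^2 = p(2 - p)$.

Now for $N \ge 2$,

    $\mathrm{Prob}[t > N] = \mathrm{Prob}[t > N \mid X_1^1 \ge 2 \text{ and } X_1^2 \ge 2]\, \mathrm{Prob}[X_1^1 \ge 2 \text{ and } X_1^2 \ge 2]$
    $\qquad + \mathrm{Prob}[t > N \mid X_1^1 = 1 \text{ and } X_1^2 \ge 2]\, \mathrm{Prob}[X_1^1 = 1 \text{ and } X_1^2 \ge 2]$    (3.4)
    $\qquad + \mathrm{Prob}[t > N \mid X_1^1 \ge 2 \text{ and } X_1^2 = 1]\, \mathrm{Prob}[X_1^1 \ge 2 \text{ and } X_1^2 = 1]$
    $\qquad + \mathrm{Prob}[t > N \mid X_1^1 = 1 \text{ and } X_1^2 = 1]\, \mathrm{Prob}[X_1^1 = 1 \text{ and } X_1^2 = 1]$,

in which the last term vanishes, because the condition $X_1^1 = X_1^2 = 1$ means that coupling
occurs at the first card. Now by (3.3)

    $\mathrm{Prob}[t > N \mid X_1^1 \ge 2 \text{ and } X_1^2 \ge 2] = \mathrm{Prob}[t > N - 1]$.

In the second case $X_1^2 - 1$ is geometrically distributed, hence by (3.3) again

    $\mathrm{Prob}[t > N \mid X_1^1 = 1 \text{ and } X_1^2 \ge 2] = \mathrm{Prob}[t > N - 1]$.

The same holds for the third case, so (3.4) becomes

    $\mathrm{Prob}[t > N] = \mathrm{Prob}[t > N - 1]\, \mathrm{Prob}[\max(X_1^1, X_1^2) \ge 2]$
    $\qquad = p(2 - p)\, \mathrm{Prob}[t > N - 1]$.

The theorem follows. ■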

For the geometric distribution the magician can improve his chances by always selecting
the first card. Let $t'$ denote the coupling time for this process where the subject draws his
secret number from $\mathcal{G}_p$. Then one finds by a similar calculation that

    $\mathrm{Prob}[t' > N] = p\,(p(2 - p))^{N-1} = p^N (2 - p)^{N-1}$,    (3.5)

which is smaller than (3.2) by a factor $1/(2 - p)$.
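Formulas (3.2) and (3.5) can be checked against a direct computation. The sketch below is an illustrative dynamic program, not the argument used in the proof: it tracks the positions of the two current key cards, always advances the one further behind, and treats every position past the end of the deck as a single state. The function name and the use of Python's `Fraction` type for exact arithmetic are choices made here.

```python
from fractions import Fraction
from functools import lru_cache


def no_coupling_probability(p, n, magician_first_card=False):
    """Exact Prob[no common key card in positions 1..n] for an i.i.d. geometric deck."""
    p = Fraction(p)
    beyond = n + 1                           # stands for every position past the deck

    def landings(a):
        """Positions reachable in one count from position a, with their probabilities."""
        for v in range(1, n - a + 1):
            yield a + v, (1 - p) * p ** (v - 1)
        yield beyond, p ** (n - a)           # Prob[label >= n + 1 - a]

    def outcome(x, y):
        """No-coupling probability given current key cards at positions x and y."""
        if x == y:                           # a shared key card, unless both left the deck
            return Fraction(int(x == beyond))
        return lagging(min(x, y), max(x, y))

    @lru_cache(maxsize=None)
    def lagging(a, b):
        # Here a < b, so a <= n and the key card at position a is the next to move.
        return sum((w * outcome(x, b) for x, w in landings(a)), Fraction(0))

    secrets = list(landings(0))              # secret number s puts the first key card at s
    if magician_first_card:
        return sum((w * outcome(s, 1) for s, w in secrets), Fraction(0))
    return sum((w1 * w2 * outcome(s1, s2)
                for s1, w1 in secrets for s2, w2 in secrets), Fraction(0))


p = Fraction(6, 7)
print(no_coupling_probability(p, 10) == (p * (2 - p)) ** 10)
print(no_coupling_probability(p, 10, magician_first_card=True) == p ** 10 * (2 - p) ** 9)
```

Because the arithmetic is exact, agreement here means equality of rational numbers rather than agreement up to rounding.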



4. Uniform Distribution 

Consider a deck of $N$ cards having a uniform i.i.d. distribution of card values drawn from
$[1, B]$. We estimate $\mathrm{Prob}[t > N]$ where $t$ is the coupling time assuming that both the magician
and the subject draw a secret value uniformly from $[1, B]$.

For our analysis we introduce two auxiliary finite state Markov chains. The first of these is
a chain $\mathcal{L}_B$ that we call the leapfrog chain. View the subject and magician as performing the
Kruskal counting procedure on two independently drawn decks. The subject will use a white
pebble to mark the location of key cards and the magician will use a black pebble, according
to their decks, and simultaneously each moves to their respective first key card. After this
is done, the person having his pebble furthest behind in the deck moves it to his next key
card. In case of a tie, where both pebbles are in the same relative position in the deck, a
move consists of both persons simultaneously moving their pebbles to their next key cards,
respectively. (Since the players have separate decks, the next key card values of the two players
need not be the same.) The states of the chain $\mathcal{L}_B$ represent the distance the white pebble is
currently ahead of or behind the black pebble in the card numbering, so there are $2B - 1$ states
$i$ with $-(B - 1) \le i \le B - 1$. A transition occurs whenever a pebble is moved; a transition from
state $0$ corresponds to both pebbles moving (independently), while a transition from any other
state corresponds to exactly one pebble being moved. A transition often involves one pebble
leapfrogging over the other, hence the choice of name for $\mathcal{L}_B$. The transition probabilities $p_{ij}$
are determined by the uniform distribution on card values. For $i \ne 0$ the transition from $i$ to
$j$ is determined by the value $v$ of the key card by

    $v = \mathrm{sign}(i)(i - j)$,    (4.1)

so that

    $p_{ij} = \frac{1}{B}$ if $1 \le \mathrm{sign}(i)(i - j) \le B$, and $p_{ij} = 0$ otherwise,    (4.2a)

while for $i = 0$ the transition probabilities are

    $\pi_j := p_{0j} = \frac{B - |j|}{B^2}$.    (4.2b)

This chain is ergodic, and it is easy to check that $\pi_j$ in (4.2b) gives the stationary distribution
for $\mathcal{L}_B$. Table 4.1 gives the state transition matrix $[p_{ij}]$ for $\mathcal{L}_4$.

Table 4.1: Leapfrog Chain $\mathcal{L}_4$
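The leapfrog chain is small enough to build explicitly. The following sketch assumes the description above: from a nonzero state the pebble further behind jumps a uniform value in $[1, B]$, and from state $0$ both pebbles jump independently, which gives $p_{0j} = (B - |j|)/B^2$. It constructs the transition probabilities with exact fractions and checks that each row sums to one and that the row for state $0$ is a stationary distribution. The function names are illustrative.

```python
from fractions import Fraction


def sign(i):
    return (i > 0) - (i < 0)


def leapfrog_chain(B):
    """States -(B-1)..B-1 and transition probabilities p[i][j] of the leapfrog chain."""
    states = list(range(-(B - 1), B))
    p = {i: {j: Fraction(0) for j in states} for i in states}
    for j in states:
        p[0][j] = Fraction(B - abs(j), B * B)         # both pebbles jump independently
    for i in states:
        if i != 0:
            for v in range(1, B + 1):                 # the pebble further behind jumps v
                p[i][i - sign(i) * v] = Fraction(1, B)
    return states, p


def is_stationary(states, p, dist):
    return all(sum(dist[i] * p[i][j] for i in states) == dist[j] for j in states)


states, p = leapfrog_chain(4)
pi = {j: p[0][j] for j in states}
print(all(sum(row.values()) == 1 for row in p.values()), is_stationary(states, p, pi))
```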

Now consider the case that the subject and magician perform the Kruskal counting procedure
on the same deck. As long as their sequences of key cards remain disjoint, these key
card values are independent random variables, and the relative positions of their current key cards
are described by transitions of the leapfrog chain. This persists until they have a key card in
common, i.e. until the state $0$ is reached on the leapfrog chain. Thus $\mathrm{Prob}[t > N]$ corresponds
to the probability of those sequences of transitions in the leapfrog chain starting from $0$ that
avoid the state $0$ until one pebble has moved to a position beyond $N$. We can keep track of
sequences that never visit $0$ by forming the reduced leapfrog chain $\bar{\mathcal{L}}_B$ obtained by deleting the
state $0$ and assigning new transition probabilities

    $\bar{p}_{ij} := (1 - p_{i0})^{-1} p_{ij}$.    (4.3)

For $\mathcal{L}_B$ the probability of going to $0$ is a constant, hence

    $\bar{p}_{ij} = \left(1 - \frac{1}{B}\right)^{-1} p_{ij}$,    (4.4)

so that all values $\bar{p}_{ij}$ are either $\frac{1}{B-1}$ or $0$. Table 4.2 gives the state transition probabilities $[\bar{p}_{ij}]$
for $\bar{\mathcal{L}}_4$.

Table 4.2: Reduced Leapfrog Chain $\bar{\mathcal{L}}_4$

The initial state distribution on the reduced leapfrog chain $\bar{\mathcal{L}}_B$ corresponds to that after
one transition of the leapfrog chain from the state $0$, conditioned on not staying at $0$, which is

    $\bar{\pi}_j := \left(1 - \frac{1}{B}\right)^{-1} \pi_j$  for $1 \le |j| \le B - 1$.    (4.5)

This chain is ergodic and has $\bar{\pi}_j$ as its stationary distribution.

We next define a random variable $t^*_{N,B}$ which counts the total number of key cards produced
during the Kruskal count by the subject and magician, up to and including the first key card
that occupies a position exceeding $N$. We call $t^*_{N,B}$ the travel time beyond position $N$. To
determine the travel time, we require as additional data the position $i$ of the top key card,
which we define to be that key card which is closest to the top of the deck. Given that the
initial state of the chain is in state $j$ the conditional probability that the top key card is in
position $i$ is

    $r_{ij} = \frac{1}{B - |j|}$,  $1 \le i \le B - |j|$,    (4.6)

and is $0$ otherwise. The position of the top key card together with the sequences of successive
states of $\bar{\mathcal{L}}_B$ allow the reconstruction of all moves during the Kruskal count, and the determination
of the travel time $t^*_{N,B}$.
Lemma 4.1. If $N \ge B$ then

    $\mathrm{Prob}[t > N] = \sum_{j=1}^{N} \left(1 - \frac{1}{B}\right)^{j-1} \mathrm{Prob}[t^*_{N,B} = j]$.    (4.7)

Proof. The event $[t > N]$ corresponds to all sequences of state transitions in $\mathcal{L}_B$ starting
at state $0$ that never return to $0$ before some pebble moves to a position $\ge N + 1$. Such a
sequence of transitions is matched (after the first move) by corresponding state transitions in
$\bar{\mathcal{L}}_B$. The probabilities between $\mathcal{L}_B$ and $\bar{\mathcal{L}}_B$ differ by a multiplicative factor $(1 - \frac{1}{B})$. There is
one less factor of $(1 - \frac{1}{B})$ than $t^*_{N,B}$ counts because the initial state of $\bar{\mathcal{L}}_B$ counts as two key
cards, but corresponds to only one transition in $\mathcal{L}_B$. ■

Lemma 4.1 is useful because the distribution of the travel time $t^*_{N,B}$ is strongly peaked and
relatively tractable to estimate. Since no move of a pebble is larger than $B$, and since both
pebbles are within $B$ cards of the $N$-th card at the stopping time $t^*_{N,B}$, one has

    $t^*_{N,B} \ge \frac{2N}{B} - 1$.    (4.8)

Lemma 4.1 then yields

    $\mathrm{Prob}[t > N] \le \left(1 - \frac{1}{B}\right)^{\frac{2N}{B} - 2}$.    (4.9)

This shows the (well-known) fact that $\mathrm{Prob}[t > N]$ decreases exponentially as a function of $N$.

Using large-deviation theory we can obtain the asymptotic behavior of $\mathrm{Prob}[t > N]$ as
$N \to \infty$.

Theorem 4.1. For fixed $B$ there is a positive constant $\alpha_B$ such that

    $\mathrm{Prob}[t > N] = \exp(-\alpha_B N (1 + o(1)))$    (4.10)

as $N \to \infty$.

We relegate the proof of this result to the appendix, where we also give a variational formula
for $\alpha_B$. We easily obtain from (4.9) the inequality

    $\alpha_B \ge \frac{2}{B} \left|\log\left(1 - \frac{1}{B}\right)\right| = \frac{2}{B^2} + O\left(\frac{1}{B^3}\right)$.    (4.11)

It is intuitively clear that the expected value of a key card is $\ge \frac{B}{2}$ in all states, hence one
expects that $\mathrm{Prob}[t^*_{N,B} \le \frac{4N}{B}] > \frac{1}{2}$, which with Lemma 4.1 would imply that
$\alpha_B \le \frac{4}{B^2} + O(\frac{1}{B^3})$.
Theorem 4.2 below shows that $B^2 \alpha_B \to 4$ as $B \to \infty$, see (4.19).
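The passage from (4.9) to (4.11) can be written out step by step. The following is a sketch that uses only the definition of $\alpha_B$ in (4.10), the exponent $\frac{2N}{B} - 2$ in (4.9), and the Taylor series of the logarithm.

```latex
\begin{align*}
-\frac{1}{N}\log \mathrm{Prob}[t > N]
  &\ge \Bigl(\frac{2}{B} - \frac{2}{N}\Bigr)\,\Bigl|\log\Bigl(1 - \frac{1}{B}\Bigr)\Bigr|
  && \text{by (4.9)},\\
\alpha_B
  &\ge \frac{2}{B}\,\Bigl|\log\Bigl(1 - \frac{1}{B}\Bigr)\Bigr|
  && \text{letting } N \to \infty \text{ in (4.10)},\\
\Bigl|\log\Bigl(1 - \frac{1}{B}\Bigr)\Bigr|
  &= \frac{1}{B} + \frac{1}{2B^2} + \frac{1}{3B^3} + \cdots ,\\
\frac{2}{B}\,\Bigl|\log\Bigl(1 - \frac{1}{B}\Bigr)\Bigr|
  &= \frac{2}{B^2} + \frac{1}{B^3} + O\Bigl(\frac{1}{B^4}\Bigr)
   = \frac{2}{B^2} + O\Bigl(\frac{1}{B^3}\Bigr).
\end{align*}
```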

We next obtain upper and lower bounds for $\mathrm{Prob}[t^*_{N,B} > k]$ by approximating the reduced
leapfrog chain $\bar{\mathcal{L}}_B$ with two simpler Markov chains $\mathcal{L}^+_B$ and $\mathcal{L}^-_B$, as follows. These chains both
describe the leapfrog motion of two colored pebbles at most $B$ units apart, with the states
representing the current distance the white pebble is ahead.

(i) In $\mathcal{L}^+_B$ the pebble further behind jumps $v$ units with $v$ drawn uniformly from the
range $[1, B - 1]$.

(ii) In $\mathcal{L}^-_B$ the pebble further behind jumps $v$ units with $v$ drawn uniformly from the
range $[2, B]$.

The chain $\mathcal{L}^+_B$ is exactly like the leapfrog chain $\mathcal{L}_{B-1}$ except that in state $0$ only the white
pebble jumps. The chain $\mathcal{L}^+_B$ has $2B - 1$ states labelled by $|i| \le B - 1$, while the chain $\mathcal{L}^-_B$ has
$2B + 1$ states labelled by $|i| \le B$. Both chains have the property that the card values drawn
are independent of the current state. For the chain $\mathcal{L}^+_B$ we define a travel time $t^+_{N,B}$ beyond
position $N$, which is obtained by starting the chain in state $0$, with both pebbles in position $0$,
associating a movement of pebbles on a line with each state transition, and counting the total
number of state transitions up to and including the first time that a pebble is moved beyond
position $N$. For the chain $\mathcal{L}^-_B$ we define a travel time $t^-_{N,B}$ beyond position $N$ similarly.

Lemma 4.2. For all $N$, $B$ and $k$, one has

    $\mathrm{Prob}[t^+_{N,B} > k] \ge \mathrm{Prob}[t^*_{N,B} > k] \ge \mathrm{Prob}[t^-_{N,B} > k]$.    (4.12)

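We give the proof of Lemma 4.2 in the appendix. Lemmas 4.1 and 4.2 when combined
yield the bounds

    $P^+_{N,B} \ge \mathrm{Prob}[t > N] \ge P^-_{N,B}$    (4.13)

where $P^{\pm}_{N,B}$ denotes the sum on the right side of (4.7) with the travel time $t^*_{N,B}$ replaced by
the travel time of the corresponding auxiliary chain.

The simple form of the chains $\mathcal{L}^+_B$ and $\mathcal{L}^-_B$ allows the asymptotic behavior of $P^+_{N,B}$ and $P^-_{N,B}$
to be explicitly determined, as follows.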
Theorem 4.2. For fixed $B$ as $N \to \infty$ one has

    $P^{\pm}_{N,B} = \exp(-\alpha^{\pm}_B N (1 + o(1)))$    (4.14)

where $\alpha^+_B$ is the unique root $\alpha$ of

    $\sum_{i=1}^{B-1} \exp\left(\frac{i\alpha}{2}\right) = B$,    (4.15)

and $\alpha^-_B$ is the unique root of

    $\sum_{i=1}^{B-1} \exp\left(\frac{(i+1)\alpha}{2}\right) = B$.    (4.16)

As $B \to \infty$ these quantities satisfy

    $\alpha^+_B = \frac{4}{B^2} + O(B^{-3})$,    (4.17)

    $\alpha^-_B = \frac{4}{B^2} + O(B^{-3})$.    (4.18)



The proof of this result is given in the appendix. Theorem 4.1 together with the inequalities
(4.13) shows that for large $B$ one has

    $\mathrm{Prob}[t > N] = \exp\left(-\left(\frac{4}{B^2} + O\left(\frac{1}{B^3}\right)\right)(1 + o(1))N\right)$    (4.19)

as $N \to \infty$.

We relate these results to the model of Haga and Robins. The Markov chain studied
by Haga and Robins is obtained from the leapfrog chain by identifying states $k$ and $-k$ for all
$k \ge 1$; thus it has exactly $B$ states. The resulting chain factors out the action of the involution
sending $k$ to $-k$ under which the chain probabilities are invariant, and this loses the "leapfrog"
information which is necessary for computing exact coupling probabilities. Haga and Robins
estimate instead the probability of avoiding absorption in the absorbing state $0$ in the first $M$
transitions of the resulting factor chain. This probability asymptotically decays like $O((\lambda_B)^M)$
as $M \to \infty$, where $\lambda_B$ is the modulus of the second largest eigenvalue of the transition matrix
of their Markov chain. The characteristic polynomial of the transition matrix of
the Haga-Robins Markov chain is $p_B(x) := (x + 1/B)^B - (1 + 1/B)^B x^{B-1}$, and it can be shown
that the modulus of its second largest eigenvalue satisfies

    $\lambda_B = 1 - \frac{2}{B} + O\left(\frac{1}{B^2}\right)$    (4.20)

as $B \to \infty$. To relate this to the asymptotic coupling probability decay rate $\exp(-\alpha_B)$ in
Theorem 4.1, we note that the expected size of a step in the Haga-Robins chain is about $B/2$,
so that after $M$ steps the location of the chain should be around the position $N \approx MB/2$. One
should therefore compare $(\lambda_B)^{2/B}$ and $\exp(-\alpha_B)$, and one finds that both of these quantities
are asymptotic to $1 - \frac{4}{B^2} + O(\frac{1}{B^3})$ as $B \to \infty$, using (4.20) and Theorem 4.2.

5. Numerical Results: The Kruskal Count

We compare predictions obtained from the two models studied in this paper with the perfor- 
mance of the actual Kruskal Count trick. 

For the actual Kruskal count we consider a standard deck of 52 cards, and we assume that 
the subject draws a key card using a uniform distribution from the set of available key card 
values. We study the effects of varying the magician's strategy on the success probability of 
the Kruskal Count trick. The magician has the freedom to choose his key card, and he
also has the extra freedom to specify a rule for assigning values to the "face cards" J, Q, K.
We study three possible rules variations:

(a) Assign the values 11, 12, 13 to J, Q, K, respectively. 

(b) Assign the value 10 to each of J, Q, K.

(c) Assign the value 5 to each of J, Q, K. 

The first two of these rules variations are presented as "straw men" useful for comparison with
the models of this paper. To obtain numerical values for the Kruskal count trick we used
a Monte Carlo simulation with $10^6$ trials for each data point. For simulations of the i.i.d.
uniform deck distribution, an "exact" calculation was done using an enlarged Markov chain
which kept a running total of the value of the position $N$ of the leading pebble, and with an
absorbing state whenever a pebble jumps past the end of the deck. Since the smallest step
size is 1, this chain reaches an absorbing state after at most a number of steps equal to the size of the
deck; consequently, it suffices to compute the state of the chain after that number of steps.
Simulations of the i.i.d. "semiuniform" distributions for rules variations (b) and (c) were done
similarly to the i.i.d. uniform case.
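A Monte Carlo experiment of the kind described above can be sketched as follows. This is not the simulation code used for the tables: the function names, the default of 20000 trials, and the seed are assumptions made for illustration, and only rules variations (b) and (c), where all face cards share one value, are covered. The subject's secret number is drawn uniformly from the available card values, and a trial counts as a failure when the two tapped cards differ.

```python
import random


def make_deck(face_value):
    """Card values of a standard 52-card deck; J, Q, K all count as `face_value`."""
    ranks = list(range(1, 14)) * 4
    return [rank if rank <= 10 else face_value for rank in ranks]


def tapped_position(values, secret):
    """0-based position of the last key card, starting the count at card `secret`."""
    position, last = secret - 1, None
    while position < len(values):
        last = position
        position += values[position]
    return last


def estimate_failure(trials=20000, magician_secret=1, face_value=5, seed=0):
    """Estimate the probability that magician and subject tap different cards."""
    rng = random.Random(seed)
    deck = make_deck(face_value)
    largest_value = max(deck)
    failures = 0
    for _ in range(trials):
        rng.shuffle(deck)                                  # uniformly random ordering
        subject_secret = rng.randint(1, largest_value)     # uniform over card values
        if tapped_position(deck, subject_secret) != tapped_position(deck, magician_secret):
            failures += 1
    return failures / trials


print(estimate_failure(face_value=5), estimate_failure(face_value=10))
```

With far fewer trials than the $10^6$ used for the tables, the estimates carry a sampling error of a few tenths of a percent, so they should be compared with the tabulated values only to about two decimal places.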

The rules variation (a) corresponds to the uniform distribution on $\{1, 2, \ldots, 13\}$. The average
key card size is 7. We therefore consider as an approximation the i.i.d. geometric deck
distribution with $p = \frac{6}{7}$, which has mean key card size 7. According to Theorem 3.1 the failure
probability for $N = 52$, with the magician drawing his first key card according to the geometric
distribution, is

    $FP(\mathcal{G}_a) = \left(\frac{6}{7}\right)^{52} \left(\frac{8}{7}\right)^{52} = 0.342254$.    (5.1)

If the magician chooses the first card to be his first key card, by (3.5) his failure probability
for $N = 52$ is

    $FP(\mathcal{G}'_a) = \frac{7}{8}\, FP(\mathcal{G}_a) = 0.299472$.    (5.2)

Table 5.1: Failure probabilities for rules variation (a)

Table 5.1 presents data for rules variation (a) for the Kruskal Count and the i.i.d. uniform
deck distribution on $\{1, 2, \ldots, 13\}$. The table gives failure probabilities in which the magician's
strategy is to choose as first key card the $j$-th card, for $1 \le j \le 13$, plus a final row that gives
the failure probability when the magician draws a card uniformly in $\{1, 2, \ldots, 13\}$. The data in
Table 5.1 show that the magician does best to choose $j = 1$ as his key card. The non-Markovian
nature of the actual deck causes the failure probabilities to differ from the i.i.d. uniform deck
distribution; the effect is a decrease of about 0.3%. We also see that the failure probability for
the i.i.d. geometric distribution is an overestimate of the failure probability for the Kruskal
Count when the magician picks a random card as first key card, and underestimates the failure
probability when the magician picks the first card as key card.

We next consider the rules variations (b) and (c). For rules variation (b) the expected key
card size is $\frac{85}{13}$, so for comparison we consider the i.i.d. geometric deck distribution with $p = \frac{72}{85}$.
If the magician chooses his first key card according to the same geometric distribution, then
the failure probability is

    $FP(\mathcal{G}_b) = \left(\frac{72}{85}\right)^{52} \left(\frac{98}{85}\right)^{52} = 0.292064$,    (5.3)

while if the magician draws the first card as his key card, then

    $FP(\mathcal{G}'_b) = \frac{85}{98}\, FP(\mathcal{G}_b) = 0.253320$.    (5.4)

For rules variation (c) the expected key card size is $\frac{70}{13}$, so for comparison we consider the i.i.d.
geometric deck distribution with $p = \frac{57}{70}$. If the magician chooses his first key card with the
same geometric distribution, then the failure probability is

    $FP(\mathcal{G}_c) = \left(\frac{57}{70}\right)^{52} \left(\frac{83}{70}\right)^{52} = 0.161197$,    (5.5)

while if the magician chooses the first card as his key card, the failure probability is

    $FP(\mathcal{G}'_c) = \frac{70}{83}\, FP(\mathcal{G}_c) = 0.135949$.    (5.6)

    j      Kruskal (b)   semiuniform (b)   Kruskal (c)   semiuniform (c)   uniform
    1      0.277869      0.284060          0.146238      0.152658          0.150944
    2      0.280756      0.287235          0.148801      0.155266          0.153684
    3      0.284330      0.290447          0.151204      0.157847          0.156407
    4      0.287163      0.293623          0.153736      0.160399          0.159109
    5      0.290317      0.296782          0.156075      0.162918          0.161789
    6      0.293557      0.299920          0.159744      0.166357          0.164444
    7      0.296910      0.303034          0.162474      0.168973          0.167070
    8      0.300023      0.306118          0.164977      0.171553          0.169665
    9      0.303194      0.309171          0.167735      0.174094          0.172225
    10     0.306383      0.312185          0.170064      0.176591          0.174747
    avg    0.292050      0.298258          0.158105      0.164666          0.163008

Table 5.2: Failure probabilities for rules variations (b) and (c)
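The geometric-model figures (5.1) through (5.6) follow from the mean card value of each rules variation together with (3.1), (3.2) and (3.5). The short sketch below recomputes them; the dictionary `RULES` and the function name are illustrative choices made here.

```python
from fractions import Fraction

# Card values for one suit under rules variations (a), (b) and (c).
RULES = {
    "a": list(range(1, 11)) + [11, 12, 13],
    "b": list(range(1, 11)) + [10, 10, 10],
    "c": list(range(1, 11)) + [5, 5, 5],
}


def geometric_approximation(values, n=52):
    """Return (p, failure probability by (3.2), failure probability by (3.5))."""
    mean = Fraction(sum(values), len(values))
    p = 1 - 1 / mean                          # the mean of the geometric law is 1 / (1 - p)
    random_start = float(p * (2 - p)) ** n    # magician draws a geometric secret number
    first_card = random_start / float(2 - p)  # magician always starts on the first card
    return p, random_start, first_card


for name, values in RULES.items():
    p, random_start, first_card = geometric_approximation(values)
    print(name, p, round(random_start, 6), round(first_card, 6))
```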

Table 5.2 presents failure probability data for rules variations (b) and (c) for the Kruskal
Count and for the i.i.d. semiuniform deck distributions which have the card values $\{1, 2, \ldots, 10\}$
chosen with the same probabilities as rules variations (b) and (c) impose on the actual deck.
The non-Markovian nature of the actual deck results in the Kruskal count failure probabilities
differing from the corresponding i.i.d. deck distributions; they are smaller by about 0.6%. The
failure probability for the i.i.d. geometric distribution when the magician chooses the first card
as first key card gives an underestimate for the failure probabilities of the Kruskal Count in
rules variations (b) and (c). The numerical results show that the magician should choose the
first card as his key card. The effect of the choice of the magician's key card on the failure
probability is small, at most 2.5%. In comparing rules variations (b) and (c) we see that the
choice to have face cards take the value 5 rather than 10 has a much larger effect on the failure
probability than the magician's choice of first key card position. The final column of Table 5.2
presents the failure probabilities for the i.i.d. uniform deck distribution on $\{1, 2, \ldots, 10\}$. One
expects this i.i.d. uniform distribution to be comparable with rules variation (c) rather than
(b), because the expected key card size is similar to case (c). (The Kruskal Count (c) mean
value is slightly lower.)

To conclude: The rules variation to count face cards as having value 5 rather than 10 is 
important to the success of the Kruskal Count trick in practice; the choice of the first card as 
key card offers a further small improvement in success probability. 

6. Appendix: Proofs of Theorem 4.1, Lemma 4.2 and Theorem 4.2 

Proof of Theorem 4.1. In view of Lemma 4.1, one has

    $M_N \le \mathrm{Prob}[t > N] \le N M_N$    (6.1)

where

    $M_N := \max_{1 \le k \le N} \left\{ \left(1 - \frac{1}{B}\right)^{k-1} \mathrm{Prob}[t^*_{N,B} = k] \right\}$.    (6.2)

The maximum will occur with $k \approx \gamma N$ for some constant $\gamma = \gamma(N, B)$. To estimate $\gamma$, we
note that the travel time $t^*_{N,B}$ beyond position $N$ depends on the successive transitions of
the chain $\bar{\mathcal{L}}_B$. We convert this to a problem about successive states of the jump chain $\mathcal{L}^J_B$
having $2B(B - 1)$ states which correspond to all possible transitions of the chain $\bar{\mathcal{L}}_B$. A jump
chain state $(i, v)$ will mean state $i$ of $\bar{\mathcal{L}}_B$ together with an allowable key card value $v$ which
determines the next state of $\bar{\mathcal{L}}_B$. The allowable values are $1 \le v \le B$ with $v \ne |i|$. The
transition probability $p_{s,s'}$ from $s = (i, v)$ to $s' = (j, v')$ is $\frac{1}{B-1}$ when $j$ is uniquely determined
by (4.1) and $1 \le v' \le B$ with $v' \ne |j|$, and is $0$ otherwise.

We let $\{(i_k, v_k) : k = 1, 2, \ldots\}$ denote a sequence of states of $\mathcal{L}^J_B$, and introduce the modified 
travel time 

\[ \tilde{t}_{N,B} := \min\{k : v_1 + \cdots + v_k > 2N\}. \tag{6.3} \]

Now $\mathrm{Prob}[\tilde{t}_{N,B} \le \gamma N]$ can be estimated using large deviation theory, using the following special 
case of Theorem 1 of Donsker and Varadhan [Q]. 

Theorem A.1. For fixed $B$ and $\gamma$ one has as $N \to \infty$, 

\[ \mathrm{Prob}[\tilde{t}_{N,B} \le \gamma N] = \exp(-f_B(\gamma) N + o(N)) \tag{6.4} \]
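where the function $f_B(\gamma) = \gamma f^*(\gamma)$, where 

\[ f^*(\gamma) := \inf\left\{ I(\mu) : \mathrm{weight}(\mu) \ge \frac{2}{\gamma} \right\}. \tag{6.5} \]

Here $\mu$ runs over the set of probability measures on the state space $S$ of the chain $\mathcal{L}^J_B$, and 

\[ \mathrm{weight}(\mu) := \sum_{s = (i, v) \in S} v\, \mu(s) \]

is the expected card value for the measure $\mu$, and 

\[ I(\mu) := -\inf\left\{ \sum_{s \in S} \log\left( \frac{\pi_u(s)}{u(s)} \right) \mu(s) \;:\; u : S \to \mathbb{R}^{+} \right\}, \tag{6.6} \]

where $\pi_u(s) := \sum_{s' \in S} p_{s,s'}\, u(s')$. 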
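It is easy to show that 

\[ \tilde{t}_{N,B} \le t^*_{N,B} \le \tilde{t}_{N,B} + B, \]

and this yields 

\[ \mathrm{Prob}[\tilde{t}_{N,B} \le k - B] \le \mathrm{Prob}[t^*_{N,B} \le k] \le \mathrm{Prob}[\tilde{t}_{N,B} \le k]. \tag{6.7} \]

Using Theorem A.1 and (6.7) we see that the quantities $t^*_{N,B}$ and $\tilde{t}_{N,B}$ have the same asymptotic 
behavior, with 

\[ \mathrm{Prob}[t^*_{N,B} \le \gamma N] = \exp(-f_B(\gamma) N + o(N)) \tag{6.8} \]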



as $N \to \infty$. This leads us to define 

\[ \alpha_B := -\max_{\gamma} \left\{ \gamma \log\left(1 - \frac{1}{B}\right) - f_B(\gamma) \right\}. \tag{6.9} \]

Using the fact that $f_B(\gamma)$ is a strictly convex function, it can be shown there is a unique value 
of $\gamma$ attaining the maximum on the right-hand side. With some further work this fact and 
the estimates above imply that 

\[ M_N = \exp(-\alpha_B N + o(N)) \]

as $N \to \infty$. The bound of Theorem 4.1 follows. 
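
The sandwich between the travel time and the modified travel time can be checked empirically. The sketch below is only an illustration under stated assumptions: both pebbles start at 0, the pebble closer to the origin always moves, a jump equal to the current gap is disallowed (so the pebbles never merge), and "beyond position N" is read as strictly greater than N. The function name `travel_times` is hypothetical.

```python
import random

def travel_times(N, B, rng):
    """Simulate one run of the two-pebble chain.

    Returns (t_star, t_tilde): the number of moves until both pebbles are
    beyond N, and the first move count at which the jump total exceeds 2N.
    """
    a = b = 0
    moves = total = 0
    t_tilde = None
    while min(a, b) <= N:
        gap = abs(a - b)
        # Allowed jumps: 1..B except the current gap (assumed non-merging rule).
        v = rng.choice([x for x in range(1, B + 1) if x != gap])
        if a <= b:
            a += v          # the trailing pebble moves
        else:
            b += v
        moves += 1
        total += v
        if t_tilde is None and total > 2 * N:
            t_tilde = moves
    return moves, t_tilde

if __name__ == "__main__":
    rng = random.Random(0)
    t_star, t_tilde = travel_times(50, 10, rng)
    print(t_tilde, t_star, t_tilde + 10)
```

In every run the modified time is reached no later than the travel time, and the travel time exceeds it by at most B moves, which is why the two quantities share the same exponential asymptotics.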



Proof of Lemma 4.2. We exhibit a 1-1 correspondence between a sequence of states 
and admissible transitions for the three chains, corresponding to moving two colored pebbles 
on the line $\{n : n \ge 0\}$ starting with both at zero. 

We compare $\mathcal{L}^-_B$ and $\mathcal{L}_B$. The position after two pebble moves of $\mathcal{L}^-_B$ corresponds to an 
initial position for $\mathcal{L}_B$ together with the top key card and its color. Let $q^-_{ij}$, $q_{ij}$ denote the 
associated probability distributions of the locations $(i, j)$ of the white and black pebbles for 
the chains $\mathcal{L}^-_B$, $\mathcal{L}_B$ respectively, which are: $q^-_{ij} = \frac{1}{(B-1)^2}$ for $1 \le i, j \le B - 1$ and $0$ otherwise, 
$q_{ij} = \frac{1}{B(B-1)}$ for $1 \le i, j \le B$ with $i \ne j$ and $0$ otherwise. 

Claim. There is a mapping $\phi$ of the probability mass $q_{ij}$ on $(i, j)$ for $\mathcal{L}_B$ to various $(i', j')$ 
having $i' \le i$ and $j' \le j$ whose image is the distribution $q^-_{i'j'}$. 

Let $(i, j) \succ (i', j')$ mean $\min(i, j) \ge \min(i', j')$ and $\max(i, j) \ge \max(i', j')$, i.e. the pebbles 
$(i, j)$ are both moved further along the line than $(i', j')$, ignoring their colors. Assuming the 
claim to be true for the moment, we have a (stochastic) pairing of pebble positions $(i, j) \to (i', j')$ 
such that $(i, j) \succ (i', j')$ between $\mathcal{L}_B$ and $\mathcal{L}^-_B$. For each subsequent move, both 
chains have $B - 1$ possible transitions with probabilities each $\frac{1}{B-1}$. For $\mathcal{L}_B$ in state $k = i - j$ 
the admissible value of the next move is $\{1, 2, \ldots, B\} - \{|k|\}$. We map these transitions 
to transitions of $\mathcal{L}^-_B$ in linear order, with a mapping $\psi_{|k|}$ having $\psi_{|k|}(i) = i$ for $i < |k|$ and 
$\psi_{|k|}(i) = i - 1$ for $i \ge |k| + 1$. One easily sees that if pebbles in $\mathcal{L}_B$ are at $(i_1, j_1)$ and the 
corresponding ones are at $(i_2, j_2)$ with $(i_1, j_1) \succ (i_2, j_2)$, and if the pebble closer to the 
origin is moved $i$ resp. $\psi_{|k|}(i)$ for the two chains resulting in positions $(i_1^*, j_1^*)$, $(i_2^*, j_2^*)$ then 
$(i_1^*, j_1^*) \succ (i_2^*, j_2^*)$. This gives a stochastic pairing of pebble positions at all subsequent moves, 
with both pebbles of $\mathcal{L}^-_B$ always being behind those of $\mathcal{L}_B$ in the ordering $\succ$. Consequently 

\[ \mathrm{Prob}[t_{N,B} > t] \ge \mathrm{Prob}[t^-_{N,B} > t] \tag{6.10} \]

for all $t$, which is the right side of (4.12). 
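
The relabelling of moves used in the pairing can be checked by brute force for small parameters. The sketch below is illustrative only: `psi`, `dominates` and `check_pairing` are hypothetical names, positions are restricted to a small grid, and the map sends an admissible move i (any value in 1..B other than the gap) to i when i is below the gap and to i - 1 when it is above.

```python
from itertools import product

def psi(k_abs, i):
    """Map an admissible move i (i != k_abs) to a move in {1, ..., B-1}."""
    if i == k_abs:
        raise ValueError("a move equal to the pebble gap is not admissible")
    return i if i < k_abs else i - 1

def check_bijection(B):
    """psi maps {1..B} minus {k_abs} one-to-one onto {1..B-1}, never increasing a move."""
    for k_abs in range(1, B + 1):
        admissible = [i for i in range(1, B + 1) if i != k_abs]
        images = [psi(k_abs, i) for i in admissible]
        if sorted(images) != list(range(1, B)):
            return False
        if any(img > i for img, i in zip(images, admissible)):
            return False
    return True

def dominates(p, q):
    """Pebble ordering ignoring colors: compare the smaller and the larger positions."""
    return min(p) >= min(q) and max(p) >= max(q)

def check_pairing(B, limit):
    """One paired move (i for the leading pair, psi(i) for the other) keeps the ordering."""
    grid = range(limit)
    for p in product(grid, repeat=2):
        k_abs = abs(p[0] - p[1])
        if k_abs == 0 or k_abs > B:
            continue
        for q in product(grid, repeat=2):
            if not dominates(p, q):
                continue
            for i in range(1, B + 1):
                if i == k_abs:
                    continue
                p_new = (min(p) + i, max(p))                # trailing pebble moves by i
                q_new = (min(q) + psi(k_abs, i), max(q))    # paired pebble moves by psi(i)
                if not dominates(p_new, q_new):
                    return False
    return True
```

Because the image move is never larger than the original move, the paired pebbles cannot overtake the ones they are matched with, which is the monotonicity the argument relies on.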



It remains to prove the claim. Here we remark if $q_{ij}(n)$ and $q^-_{ij}(n)$ denote the probability 
distribution of pebble locations of $\mathcal{L}_B$ and $\mathcal{L}^-_B$ after $n$ moves then 

\[ \mathrm{Prob}[t_{N,B} > t] = 1 - \sum_{\substack{i \le N \\ j \le N}} q_{ij}(t), \]

with a similar formula for $\mathrm{Prob}[t^-_{N,B} > t]$. The proof above actually establishes the majorization 
inequalities 

\[ \sum_{\substack{i \le i_0 \\ j \le j_0}} q_{ij}(t) \le \sum_{\substack{i \le i_0 \\ j \le j_0}} q^-_{ij}(t), \quad \text{all } i_0 \ge 1,\ j_0 \ge 1, \tag{6.11} \]

for all $t \ge 2$, and the special case $i_0 = j_0 = N$ yields (6.10). The claim to be established is 
equivalent to proving that (6.11) holds for the case $t = 2$. Since the probabilities $q_{ij} = q_{ij}(2)$ 
and $q^-_{ij} = q^-_{ij}(2)$ are explicitly known, verifying (6.11) is an easy calculation. The equivalence 
of the inequalities (6.11) to the existence of a coordinate-monotone probability rearrangement 
$\phi$ is a two-dimensional majorization inequality, see Marshall and Olkin [10]. (One can also 
prove the claim by explicitly constructing a suitable mapping $\phi$ rather easily.) 
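
The "easy calculation" for the case t = 2 can be carried out exactly with rational arithmetic. The sketch below rests on two assumptions about the garbled formulas: that the two-move distributions are uniform, with mass 1/(B(B-1)) on pairs i != j in {1, ..., B} for the first chain and mass 1/(B-1)^2 on {1, ..., B-1}^2 for the second, and that (6.11) compares sums over the lower-left rectangles i <= i0, j <= j0. The function names are illustrative.

```python
from fractions import Fraction

def q(i, j, B):
    """Assumed two-move distribution: uniform on pairs i != j in 1..B."""
    if 1 <= i <= B and 1 <= j <= B and i != j:
        return Fraction(1, B * (B - 1))
    return Fraction(0)

def q_minus(i, j, B):
    """Assumed two-move distribution: uniform on pairs in 1..B-1."""
    if 1 <= i <= B - 1 and 1 <= j <= B - 1:
        return Fraction(1, (B - 1) ** 2)
    return Fraction(0)

def lower_sum(dist, i0, j0, B):
    """Total mass of dist on the rectangle i <= i0, j <= j0."""
    return sum(dist(i, j, B) for i in range(1, i0 + 1) for j in range(1, j0 + 1))

def majorization_holds(B):
    """Check the rectangle inequalities for every corner (i0, j0) in 1..B."""
    return all(
        lower_sum(q, i0, j0, B) <= lower_sum(q_minus, i0, j0, B)
        for i0 in range(1, B + 1)
        for j0 in range(1, B + 1)
    )
```

Using `Fraction` avoids floating-point ties at the corners where both sums equal 1.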

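The inequality (4.12) relating $\mathcal{L}^+_B$ and $\mathcal{L}_B$ is proved in similar fashion. If $q^+_{ij}(t)$ is the 
probability that the pebbles are at $(i, j)$ after $t$ steps, then 

\[ \sum_{\substack{i \le i_0 \\ j \le j_0}} q_{ij}(t) \ge \sum_{\substack{i \le i_0 \\ j \le j_0}} q^+_{ij}(t), \quad \text{all } i_0 \ge 1,\ j_0 \ge 1, \tag{6.12} \]

for all $t \ge 2$. ■ 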

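Proof of Theorem 4.2. We let $A(N) \approx B(N)$ mean $A(N) = B(N)^{1+o(1)}$ as $N \to \infty$. 

Consider first $\mathcal{L}^-_B$. Let $t^-_B(M)$ denote the travel time for the chain which counts the 
number of transitions up to and including the transition at which the sum of the jumps of the 
chain exceeds $M$. Then for any fixed sequence of transitions 

\[ t^-_B(2N) \ge t^-_{N,B} \ge t^-_B(2N - B). \tag{6.13} \]

Hence 

\[ P^-_{N,B} \le N \max_{1 \le j \le N} \left\{ \left(1 - \frac{1}{B}\right)^{j} \mathrm{Prob}[t^-_B(2N) = j] \right\} \tag{6.14} \]

and 

\[ P^-_{N,B} \ge \max_{1 \le j \le N} \left\{ \left(1 - \frac{1}{B}\right)^{j} \mathrm{Prob}[t^-_B(2N - B) = j] \right\}. \tag{6.15} \]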
It is easy to check that 

\[ Q^-_N := \max_{1 \le j \le N} \left\{ \left(1 - \frac{1}{B}\right)^{j} \mathrm{Prob}[t^-_B(2N) = j] \right\} \approx \max_{1 \le j \le N} \left\{ \left(1 - \frac{1}{B}\right)^{j} \mathrm{Prob}[t^-_B(2N - B) = j] \right\} \tag{6.16} \]

using $\mathrm{Prob}[t^-_B(2N - B) \ge j] \ge \mathrm{Prob}[t^-_B(2N) \ge j + B]$. It suffices to asymptotically estimate 
$Q^-_N$. One has 

(Here $j \sim \gamma N$ and $B^{-\gamma N}$ arises as $\left(1 - \frac{1}{B}\right)^{\gamma N} (B - 1)^{-\gamma N}$.) Using Stirling's formula, one 
obtains $Q^-_N \approx \exp(-\alpha^-_B N)$ where $-\alpha^-_B$ is the optimal value of the constrained maximization 
problem ($M^-$) given by: 

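\[ \text{maximize } Z = -\gamma \log B + \gamma \log \gamma - \sum_{i=1}^{B-1} \gamma_i \log \gamma_i \]

subject to 

\[ \sum_{i=1}^{B-1} i \gamma_i = 2, \tag{6.18} \]

\[ \sum_{i=1}^{B-1} \gamma_i = \gamma, \tag{6.19} \]

\[ \gamma_i \ge 0 \quad \text{for } 1 \le i \le B - 1. \tag{6.20} \]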

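Introducing Lagrange multipliers $\lambda_1$, $\lambda_2$ for the two equality constraints, and setting 

\[ G = -\gamma \log B + \gamma \log \gamma - \sum_{i=1}^{B-1} \gamma_i \log \gamma_i + \lambda_1 \left( \sum_{i=1}^{B-1} i \gamma_i - 2 \right) + \lambda_2 \left( \sum_{i=1}^{B-1} \gamma_i - \gamma \right), \]

necessary conditions for an interior extremal are: 

\[ \frac{\partial G}{\partial \gamma} = 1 + \log \gamma - \log B - \lambda_2 = 0, \tag{6.21} \]

\[ \frac{\partial G}{\partial \gamma_i} = -1 - \log \gamma_i + i \lambda_1 + \lambda_2 = 0. \tag{6.22} \]

These yield 

\[ \gamma = B \exp(\lambda_2 - 1), \tag{6.23} \]

\[ \gamma_i = \exp(\lambda_2 - 1) \exp(i \lambda_1), \quad 1 \le i \le B - 1. \tag{6.24} \]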
Substituting these expressions into (6.19) and cancelling $\exp(\lambda_2 - 1)$ from both sides yields 

\[ \sum_{i=1}^{B-1} \exp(i \lambda_1) = B. \tag{6.26} \]

Substituting the values above into (6.18) and using this formula yields 

\[ \exp(\lambda_2 - 1) = \frac{2}{\sum_{j=1}^{B-1} j \exp(j \lambda_1)}. \tag{6.27} \]

Now (6.23) and (6.24) become 

\[ \gamma = \frac{2B}{\sum_{j=1}^{B-1} j \exp(j \lambda_1)}, \qquad \gamma_i = \frac{2 \exp(i \lambda_1)}{\sum_{j=1}^{B-1} j \exp(j \lambda_1)}. \]
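Using these formulas the objective function value $Z$ is evaluated (with $F = \sum_{i=1}^{B-1} i \exp(i \lambda_1)$) 

\[
\begin{aligned}
Z &= -\frac{2B}{F} \log B + \frac{2B}{F} \log \frac{2B}{F} - \sum_{i=1}^{B-1} \frac{2 \exp(i \lambda_1)}{F} \log\left( \frac{2 \exp(i \lambda_1)}{F} \right) \\
  &= \frac{2B}{F} \log \frac{2}{F} - \sum_{i=1}^{B-1} \frac{2 \exp(i \lambda_1)}{F} \left( \log \frac{2}{F} + i \lambda_1 \right) \\
  &= -2 \lambda_1.
\end{aligned}
\tag{6.28}
\]

Combining this with (6.26) gives (4.14) for $P^-_{N,B}$ with $\alpha^-_B$ given by (4.15), provided the 
maximum of ($M^-$) occurs at an interior point where all $\gamma_i > 0$. We omit the details of 
checking that boundary extremals having some $\gamma_i = 0$ do not give the absolute maximum in 
($M^-$). 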
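
The interior extremal can be computed numerically as a consistency check on the derivation. The following sketch is an illustration rather than part of the proof: it solves $\sum_{i=1}^{B-1} \exp(i\lambda_1) = B$ for $\lambda_1 > 0$ by bisection, forms $\gamma$ and the $\gamma_i$ from the closed-form expressions in terms of $F$, and evaluates the objective $Z$ directly. The function names and tolerance are assumptions.

```python
import math

def solve_lambda1(B, tol=1e-14):
    """Bisection for the positive root of sum_{i=1}^{B-1} exp(i*lam) = B."""
    def g(lam):
        return sum(math.exp(i * lam) for i in range(1, B)) - B
    lo, hi = 0.0, 1.0            # g(0) = -1 < 0 and g(1) > 0 for every B >= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def extremal_point(B):
    """Return (lam1, gamma, gammas, Z) at the interior extremal of the problem."""
    lam = solve_lambda1(B)
    F = sum(i * math.exp(i * lam) for i in range(1, B))
    gammas = [2 * math.exp(i * lam) / F for i in range(1, B)]
    gamma = 2 * B / F
    Z = (-gamma * math.log(B) + gamma * math.log(gamma)
         - sum(g * math.log(g) for g in gammas))
    return lam, gamma, gammas, Z

if __name__ == "__main__":
    lam, gamma, gammas, Z = extremal_point(10)
    print("lambda_1 =", lam, " Z =", Z, " -2*lambda_1 =", -2 * lam)
```

For each B the computed point satisfies both equality constraints, and the directly evaluated objective agrees with $-2\lambda_1$ up to rounding.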

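The case of $P^+_{N,B}$ is handled by analogous arguments. One reduces it to solving the 
constrained maximization problem ($M^+$) given by: 

\[ \text{maximize } Z = -\gamma \log B + \gamma \log \gamma - \sum_{i=1}^{B-1} \gamma_i \log \gamma_i \]

subject to 

\[ \sum_{i=1}^{B-1} (i + 1) \gamma_i = 2, \tag{6.29} \]

\[ \sum_{i=1}^{B-1} \gamma_i = \gamma, \tag{6.30} \]

\[ \gamma_i \ge 0 \quad \text{for } 1 \le i \le B - 1. \tag{6.31} \]

Again $Z = -2\lambda_1$ at the extremal point, and $\lambda_1$ is determined by 

\[ \sum_{i=1}^{B-1} \exp((i + 1) \lambda_1) = B, \]

which is (4.16). 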
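
The same numerical check applies to the second problem. The sketch below is again only illustrative: it solves $\sum_{i=1}^{B-1} \exp((i + \text{offset})\lambda_1) = B$ by bisection, with offset 1 for the problem with weights $i + 1$ and offset 0 for the problem with weights $i$, and evaluates $Z$ at the corresponding interior point $\gamma_i \propto \exp((i + \text{offset})\lambda_1)$. The helper names and the `offset` parameter are assumptions introduced for this example.

```python
import math

def solve_rate(B, offset, tol=1e-14):
    """Bisection for the positive root of sum_{i=1}^{B-1} exp((i+offset)*lam) = B."""
    def g(lam):
        return sum(math.exp((i + offset) * lam) for i in range(1, B)) - B
    lo, hi = 0.0, 1.0            # g(0) = -1 < 0 and g(1) > 0 for every B >= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def extremal_value(B, offset):
    """Return (lam1, Z) at the interior extremal with weights (i + offset)."""
    lam = solve_rate(B, offset)
    F = sum((i + offset) * math.exp((i + offset) * lam) for i in range(1, B))
    gammas = [2 * math.exp((i + offset) * lam) / F for i in range(1, B)]
    gamma = sum(gammas)          # equals 2B/F by the root condition
    Z = (-gamma * math.log(B) + gamma * math.log(gamma)
         - sum(g * math.log(g) for g in gammas))
    return lam, Z

if __name__ == "__main__":
    for B in (5, 10):
        print(B, extremal_value(B, 0), extremal_value(B, 1))
```

With offset 1 the root is smaller than with offset 0, so the exponential rate obtained from the second problem is the smaller of the two.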

The asymptotic formulae (4.17) and (4.18) are obtained from the formulas (4.15) and (4.16), 
by setting $\exp(\tfrac{1}{2}\alpha) = 1 + \frac{2}{B^2} + \frac{\delta}{B^3} + O(B^{-4})$, in which $\delta$ is an unknown to be determined. We 
find it by substituting this expansion into the defining equation, expanding in powers of $1/B$, 
and solving for the appropriate value of $\delta$. 